
    Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Data Access

    Pregel is a popular distributed computing model for dealing with large-scale graphs. However, it can be tricky to implement graph algorithms correctly and efficiently in Pregel's vertex-centric model, especially when the algorithm has multiple computation stages, complicated data dependencies, or even communication over dynamic internal data structures. Some domain-specific languages (DSLs) have been proposed to provide more intuitive ways to implement graph algorithms, but due to the lack of support for remote access --- reading or writing attributes of other vertices through references --- they cannot handle the above-mentioned dynamic communication, making a class of Pregel algorithms with fast convergence impossible to implement. To address this problem, we design and implement Palgol, a more declarative and powerful DSL which supports remote access. In particular, programmers can use a more declarative syntax called chain access to naturally specify dynamic communication as if directly reading data on arbitrary remote vertices. By analyzing the logic patterns of chain access, we provide a novel algorithm for compiling Palgol programs to efficient Pregel code. We demonstrate the power of Palgol by using it to implement several practical Pregel algorithms, and the evaluation results show that the efficiency of Palgol is comparable with that of hand-written code. Comment: 12 pages, 10 figures, extended version of APLAS 2017 paper
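
    A minimal, single-machine sketch of the vertex-centric (Pregel-style) model that Palgol targets: in each superstep a vertex reads its incoming messages, updates its value, and sends messages to its neighbours, and the computation stops once no messages remain. The min-label connected-components logic, the toy graph, and all names below are illustrative assumptions, not Palgol syntax or a real Pregel runtime.

        def pregel_min_label(adj):
            """adj: dict mapping each vertex to a list of its neighbours (undirected graph)."""
            value = {v: v for v in adj}                      # each vertex starts with its own id as label
            # Superstep 0: every vertex announces its label to its neighbours.
            inbox = {v: [value[u] for u in adj[v]] for v in adj}
            while any(inbox.values()):                       # run supersteps until no messages are sent
                outbox = {v: [] for v in adj}
                for v, msgs in inbox.items():
                    if msgs and min(msgs) < value[v]:        # adopt the smallest label seen so far
                        value[v] = min(msgs)
                        for u in adj[v]:
                            outbox[u].append(value[v])       # propagate the improvement
                inbox = outbox
            return value                                     # equal labels = same connected component

        # Example: two components {0, 1, 2} and {3, 4}.
        print(pregel_min_label({0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}))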

    Finding 2-Edge and 2-Vertex Strongly Connected Components in Quadratic Time

    We present faster algorithms for computing the 2-edge and 2-vertex strongly connected components of a directed graph, which are straightforward generalizations of strongly connected components. While in undirected graphs the 2-edge and 2-vertex connected components can be found in linear time, in directed graphs only rather simple O(mn)-time algorithms were known. We use a hierarchical sparsification technique to obtain algorithms that run in time O(n^2). For 2-edge strongly connected components our algorithm gives the first running time improvement in 20 years. Additionally we present an O(m^2 / log n)-time algorithm for 2-edge strongly connected components, and thus improve over the O(mn) running time also when m = O(n). Our approach extends to k-edge and k-vertex strongly connected components for any constant k with a running time of O(n^2 log^2 n) for edges and O(n^3) for vertices.
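
    For orientation, here is a hedged baseline sketch of ordinary strongly connected components, computed with Kosaraju's linear-time two-pass algorithm, which is the notion the 2-edge and 2-vertex variants generalize. It is not the paper's hierarchical sparsification algorithm, and the vertex/edge-list encoding is an assumption.

        def kosaraju_scc(n, edges):
            """n: number of vertices 0..n-1; edges: iterable of directed (u, v) pairs."""
            fwd = [[] for _ in range(n)]
            rev = [[] for _ in range(n)]
            for u, v in edges:
                fwd[u].append(v)
                rev[v].append(u)

            # Pass 1: iterative DFS on the forward graph, recording vertices by finish time.
            order, seen = [], [False] * n
            for s in range(n):
                if seen[s]:
                    continue
                seen[s] = True
                stack = [(s, iter(fwd[s]))]
                while stack:
                    v, it = stack[-1]
                    w = next(it, None)
                    if w is None:
                        order.append(v)                  # v is finished
                        stack.pop()
                    elif not seen[w]:
                        seen[w] = True
                        stack.append((w, iter(fwd[w])))

            # Pass 2: DFS on the reversed graph in decreasing finish time; each tree is one SCC.
            comp = [-1] * n
            label = 0
            for s in reversed(order):
                if comp[s] != -1:
                    continue
                comp[s] = label
                stack = [s]
                while stack:
                    v = stack.pop()
                    for w in rev[v]:
                        if comp[w] == -1:
                            comp[w] = label
                            stack.append(w)
                label += 1
            return comp                                  # comp[v] = index of v's SCC

        # Example: a 3-cycle plus a tail vertex gives components [0, 0, 0, 1] (up to labelling).
        print(kosaraju_scc(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))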

    Polynomial algorithms for the Maximal Pairing Problem: efficient phylogenetic targeting on arbitrary trees

    Background: The Maximal Pairing Problem (MPP) is the prototype of a class of combinatorial optimization problems that are of considerable interest in bioinformatics: Given an arbitrary phylogenetic tree T and weights ω_xy for the paths between any two leaves (x, y), what is the collection of edge-disjoint paths between pairs of leaves that maximizes the total weight? Special cases of the MPP for binary trees and equal weights have been described previously; algorithms to solve the general MPP are still missing, however. Results: We describe a relatively simple dynamic programming algorithm for the special case of binary trees. We then show that the general case of multifurcating trees can be treated by interleaving solutions to certain auxiliary Maximum Weighted Matching problems with an extension of this dynamic programming approach, resulting in an overall polynomial-time solution of complexity O(n^4 log n) w.r.t. the number n of leaves. The source code of a C implementation can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/Targeting. For binary trees, we furthermore discuss several constrained variants of the MPP as well as a partition function approach to the probabilistic version of the MPP. Conclusions: The algorithms introduced here make it possible to solve the MPP also for large trees with high-degree vertices. This has practical relevance in the field of comparative phylogenetics and, for example, in the context of phylogenetic targeting, i.e., data collection with resource limitations.
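
    The Maximum Weighted Matching sub-step mentioned above can be illustrated with networkx; the toy graph, vertex names, and weights below are invented, and this sketch is not the paper's released C implementation.

        import networkx as nx

        # Toy auxiliary matching instance at a multifurcating node: vertices stand for the
        # child subtrees, and each edge weight is the best pairing value the tree DP would
        # supply (values invented for illustration).
        g = nx.Graph()
        g.add_weighted_edges_from([("t1", "t2", 3.0), ("t1", "t3", 2.0), ("t2", "t4", 1.5)])

        matching = nx.max_weight_matching(g)   # here it picks t1-t3 and t2-t4: total 3.5 beats t1-t2 alone (3.0)
        print(matching)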

    On vertex adjacencies in the polytope of pyramidal tours with step-backs

    We consider the traveling salesperson problem in a directed graph. The pyramidal tours with step-backs are a special class of Hamiltonian cycles for which the traveling salesperson problem is solved by dynamic programming in polynomial time. The polytope of pyramidal tours with step-backs PSB(n) is defined as the convex hull of the characteristic vectors of all possible pyramidal tours with step-backs in a complete directed graph. The skeleton of PSB(n) is the graph whose vertex set is the vertex set of PSB(n) and whose edge set is the set of geometric edges or one-dimensional faces of PSB(n). The main result of the paper is a necessary and sufficient condition for vertex adjacencies in the skeleton of the polytope PSB(n) that can be verified in polynomial time. Comment: in English

    Greedy Shortest Common Superstring Approximation in Compact Space

    Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and O(n log σ) bits of space, where n is the total length of the input strings in characters, and σ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.
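
    A naive plain-Python sketch of the classic greedy heuristic described above: repeatedly merge the pair of strings with the longest overlap. This version runs in quadratic time and does not treat containment of one string in another specially; achieving O(n log σ) time and compact space is precisely the paper's contribution, so this is only an illustration of the heuristic's behaviour.

        def overlap(a, b):
            """Length of the longest suffix of a that is a prefix of b."""
            for k in range(min(len(a), len(b)), 0, -1):
                if a.endswith(b[:k]):
                    return k
            return 0

        def greedy_scs(strings):
            strings = [s for s in strings if s]               # drop empty strings
            while len(strings) > 1:
                best = (-1, None, None)                       # (overlap length, i, j)
                for i, a in enumerate(strings):
                    for j, b in enumerate(strings):
                        if i != j:
                            k = overlap(a, b)
                            if k > best[0]:
                                best = (k, i, j)
                k, i, j = best
                merged = strings[i] + strings[j][k:]          # merge the best pair
                strings = [s for t, s in enumerate(strings) if t not in (i, j)] + [merged]
            return strings[0] if strings else ""

        print(greedy_scs(["ACGT", "GTTA", "TTAAC"]))          # -> "ACGTTAAC"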

    An integrated approach to a combinatorial optimisation problem

    Funding: MRC grant MR/S003819/1 and Health Data Research UK, an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities. We take inspiration from a problem from the healthcare domain, where patients with several chronic conditions follow different guidelines designed for the individual conditions, and where the aim is to find the best treatment plan for a patient that avoids adverse drug reactions, respects the patient's preferences and prioritises drug efficacy. Each chronic condition guideline can be abstractly described by a directed graph, where each node indicates a treatment step (e.g., a choice in medications or resources) and has a certain duration. The search for the best treatment path is seen as a combinatorial optimisation problem, and we show how to select a path across the graphs constrained by a notion of resource compatibility. This notion takes into account interactions between any finite number of resources, and makes it possible to express non-monotonic interactions. Our formalisation also introduces a discrete temporal metric, so as to consider only simultaneous nodes in the optimisation process. We express the formal problem as an SMT problem and provide a correctness proof of the SMT code by exploiting the interplay between SMT solvers and the proof assistant Isabelle/HOL. The problem we consider combines aspects of optimal graph execution and resource allocation, showing how an SMT solver can be an alternative to other approaches which are well-researched in the corresponding domains.
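
    A hedged sketch of the kind of SMT encoding described above, using the Z3 Python API: Boolean selection variables pick one treatment step per guideline, a compatibility constraint forbids one invented adverse drug pair, and the objective maximises total efficacy. The guidelines, drugs, and scores are made up, and the paper's actual encoding (and its Isabelle/HOL correctness proof) is considerably richer.

        from z3 import Bool, If, Not, And, Optimize, Sum, sat, is_true

        # Invented guidelines: each maps candidate treatment steps to an efficacy score.
        guidelines = {
            "hypertension": {"ace_inhibitor": 8, "beta_blocker": 6},
            "diabetes":     {"metformin": 9, "sulfonylurea": 5},
        }
        incompatible = [("beta_blocker", "sulfonylurea")]    # made-up adverse interaction

        pick = {d: Bool(d) for g in guidelines for d in guidelines[g]}
        opt = Optimize()

        for g, drugs in guidelines.items():                  # exactly one step per guideline
            opt.add(Sum([If(pick[d], 1, 0) for d in drugs]) == 1)

        for a, b in incompatible:                            # resource-compatibility constraint
            opt.add(Not(And(pick[a], pick[b])))

        opt.maximize(Sum([If(pick[d], score, 0)
                          for g in guidelines for d, score in guidelines[g].items()]))

        if opt.check() == sat:
            m = opt.model()
            print([d for d in pick if is_true(m[pick[d]])])  # e.g. ['ace_inhibitor', 'metformin']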

    On the Complexity of Scheduling in Wireless Networks

    We consider the problem of throughput-optimal scheduling in wireless networks subject to interference constraints. We model the interference using a family of K-hop interference models, under which no two links within a K-hop distance can successfully transmit at the same time. For a given K, we can obtain a throughput-optimal scheduling policy by solving the well-known maximum weighted matching problem. We show that for K > 1, the resulting problems are NP-Hard and cannot be approximated within a factor that grows polynomially with the number of nodes. Interestingly, for geometric unit-disk graphs that can be used to describe a wide range of wireless networks, the problems admit polynomial time approximation schemes within a factor arbitrarily close to 1. In these network settings, we also show that a simple greedy algorithm can provide a 49-approximation, and the maximal matching scheduling policy, which can be easily implemented in a distributed fashion, achieves a guaranteed fraction of the capacity region for "all K." The geometric constraints are crucial to obtain these throughput guarantees. These results are encouraging as they suggest that one can develop low-complexity distributed algorithms to achieve near-optimal throughput for a wide range of wireless networks.
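
    A hedged sketch of the greedy maximal-matching idea under the 1-hop (K = 1) interference model: activate links in decreasing weight (e.g., queue length) order, skipping any link that shares an endpoint with an already activated link. The link set and weights are invented; this illustrates the policy only, not the paper's capacity-region analysis.

        def greedy_maximal_matching(links):
            """links: list of (weight, u, v) tuples; returns the list of activated links."""
            busy, schedule = set(), []
            for w, u, v in sorted(links, reverse=True):    # heaviest links first
                if u not in busy and v not in busy:        # no shared endpoint => no 1-hop conflict
                    schedule.append((u, v))
                    busy.update((u, v))
            return schedule

        links = [(5, "a", "b"), (4, "b", "c"), (3, "c", "d"), (2, "d", "a")]
        print(greedy_maximal_matching(links))              # [('a', 'b'), ('c', 'd')]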

    Calculating Ensemble Averaged Descriptions of Protein Rigidity without Sampling

    Previous works have demonstrated that protein rigidity is related to thermodynamic stability, especially under conditions that favor formation of native structure. Mechanical network rigidity properties of a single conformation are efficiently calculated using the integer body-bar Pebble Game (PG) algorithm. However, thermodynamic properties require averaging over many samples from the ensemble of accessible conformations to accurately account for fluctuations in network topology. We have developed a mean field Virtual Pebble Game (VPG) that represents the ensemble of networks by a single effective network. That is, the set of all possible distance constraints (or bars) that can form between a pair of rigid bodies is replaced by the average number of bars. The resulting effective network is viewed as having weighted edges, where the weight of an edge quantifies its capacity to absorb degrees of freedom. The VPG is interpreted as a flow problem on this effective network, which eliminates the need to sample. We apply the VPG to proteins for the first time, across a nonredundant dataset of 272 protein structures. Our results show numerically and visually that the rigidity characterizations of the VPG accurately reflect the ensemble averaged properties. This result positions the VPG as an efficient alternative for understanding the mechanical role that chemical interactions play in maintaining protein stability.

    Assessing the functional coherence of modules found in multiple-evidence networks from Arabidopsis

    Background: Combining multiple evidence-types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer the function of as yet unannotated proteins, to discover previously unknown roles of proteins in diseases, and to better understand the regulation and interrelationship between different elements of complex biological systems. Results: We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we have constructed five relationship networks, four based on single types of data (PPI, co-expression, co-occurrence of protein names in scientific literature abstracts, and sequence similarity) and a fifth one combining these four evidence types. The ability of these networks to suggest biologically meaningful grouping of proteins was explored by applying Markov clustering and then by measuring the functional coherence of the clusters. Conclusions: Relationship networks integrating multiple evidence-types are biologically informative and allow more proteins to be assigned to a putative functional module. Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integration of more data sources improves the ability to uncover functional associations between proteins, both by allowing more proteins to be linked and by producing a network whose modular structure more closely reflects the hierarchy in the Gene Ontology.
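
    A hedged illustration of the two coherence aspects defined above, using invented module assignments and GO annotations: (i) how fragmented each GO term is across modules and (ii) the most representative (most frequent) GO term of each module. These simple counts are stand-ins, not the paper's exact metrics.

        from collections import Counter, defaultdict

        # Invented data: protein -> module, and protein -> set of GO term annotations.
        modules = {"p1": "m1", "p2": "m1", "p3": "m2", "p4": "m2", "p5": "m2"}
        go_terms = {"p1": {"GO:0006355"}, "p2": {"GO:0006355"},
                    "p3": {"GO:0006355", "GO:0009607"}, "p4": {"GO:0009607"}, "p5": {"GO:0009607"}}

        term_to_modules = defaultdict(set)
        module_terms = defaultdict(Counter)
        for protein, module in modules.items():
            for term in go_terms[protein]:
                term_to_modules[term].add(module)
                module_terms[module][term] += 1

        fragmentation = {t: len(ms) for t, ms in term_to_modules.items()}          # modules per GO term
        representative = {m: c.most_common(1)[0][0] for m, c in module_terms.items()}  # dominant term per module
        print(fragmentation)    # e.g. {'GO:0006355': 2, 'GO:0009607': 1}
        print(representative)   # e.g. {'m1': 'GO:0006355', 'm2': 'GO:0009607'}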